SUMMARY


Current  methodology  for  the  analysis  of  categorical  data  can  be  traced  to  two  papers 
published  in  1900,  one  by  Pearson  and  one  by  Yule.  The  keys  to  the  linking  of  their  ideas 
have  been  the  use  of  linear  models  and  inferential  tools  due  to  Fisher.  After  50  years  of 
research,  statisticians  have  come  close  to  developing  a  comprehensive  approach  to  categorical 
data  problems  that  stresses  three  basic  themes:  interpretability,  flexibility,  and  computability. 
This  paper  surveys  the  evolution  of  this  comprehensive  approach  and  classes  of  problems  for 
which  it  has  proven  useful.  A  concluding  section  contains  some  speculation  on  unsolved 
methodological problems of current interest and on future developments.

1. Introduction


As  recently  as  the  late  1960's,  the  perception  of  most  statistics  students  and  users  of  statistical 
methods  was  that  the  analysis  of  categorical  data  consisted  primarily  of  topics  such  as  2x2 
tables  and  Fisher’s  exact  test,  chi-square  tests  for  goodness-of-fit,  and  examining  the  models  of 
independence  or  homogeneity  of  proportions  in  two-way  contingency  tables.  The  reality  was 
that  by  1965  the  statistical  literature  contained  over  150  papers  on  methodology  for  contingency 
table  analysis  including  techniques  for  the  analysis  of  multi-way  tables  using  loglinear  models 
(e.g.  see  the  partial  bibliography  in  Kastenbaum,  1970,  which  goes  up  to  1965,  and  the 
subsequent  bibliography  through  1974  by  Killion  and  Zahn,  1976).  The  intervening  15  years 
have  seen  a  dramatic  change  in  both  the  perception  and  the  reality.  The  development  and 
elaboration  of  the  loglinear  model  approach  to  categorical  data  analysis  has  led  to  the 
publication  of  at  least  a  dozen  books  and  monographs  on  the  topic  (for  a  partial  list  see 
Fienberg,  1982a),  and  the  basic  ideas  on  the  use  of  loglinear  models  for  multi-way  arrays  now 
appear in many textbooks on statistical methodology alongside material on multiple regression and
ANOVA, whose linear heritage they share.

The  key  features  of  what  is  viewed  by  many  as  a  comprehensive  approach  to  the  analysis  of 
categorical  data  can  be  traced  back  to  two  unrelated  papers,  both  of  which  appeared  in  1900. 
In one of these papers, Pearson (1900) proposed the chi-square test for comparing observed
and expected frequencies, and derived its asymptotic distribution when the parameters
underlying the expected frequencies are known a priori (for further details and a discussion,
see Plackett, 1983). This result, as amplified by Fisher (1922) to adjust the degrees of freedom
(d.f.) for the estimation of parameters, forms the basis of the usual asymptotic theory used to
check on the goodness-of-fit of loglinear and other models. In the other paper, Yule (1900)
described the structural relationship among categorical variables by means of functions of
cross-product or odds ratios. In particular he developed a general notation for $2^n$ contingency
tables and the concepts of partial and joint association for dichotomous variables. Fisher
(1922) pulled these ideas together for IXJ contingency tables, showing that the chi-square test
for independence had (I-1)(J-1) d.f. This is roughly where the development of categorical data
analysis stood when Fisher first visited Iowa State University in the summer of 1931, just
before the founding of the Statistical Laboratory. (It was on this occasion that Fisher learned
that A.E. Brandt had developed a formula for computing the chi-square statistic in a special
case, and Fisher incorporated it into one of his lectures at Ames (Box, 1978).)

Over  the  years,  there  has  been  considerable  interest  in  the  analysis  of  categorical  data  at  Iowa 
State  University.  Snedecor  (1937)  included  material  on  it  in  the  first  edition  of  Statistical 
Methods, and Cochran (1940, 1942) wrote on the topic during his ISU years. Later editions of
Statistical Methods incorporated Bartlett’s work on 2X2X2 tables although Snedecor’s (1958)
paper  incorrectly  noted  that  Bartlett’s  test  for  no-second-order  interaction  and  a  test  proposed 
by  Lancaster  are  "asymptotically  equal."  Other  ISU  faculty  and  graduates  who  have  made 
methodological contributions to the topic include R.L. Anderson, K. Hinkelmann,
O. Kempthorne, and K. Koehler.

The  next  section  of  this  paper  gives  a  brief  historical  review  of  the  development  of  ideas  on 
loglinear  models  and  their  use  in  the  analysis  of  categorical  data  over  the  past  50  years.  Then 
in  Section  3,  we  describe  the  use  of  loglinear  models  for  contingency  tables,  stressing  alternate 
representations  of  the  models  and  their  interpretations.  In  Section  4  we  indicate  how  loglinear 
models  have  been  adapted  to  other  forms  of  categorical  data  analysis,  and  the  links  between 
these new methods and loglinear models for multi-way contingency tables that facilitate
the computation of parameter estimates. We conclude the paper with some speculation on
unsolved  methodological  problems  of  current  interest  and  on  future  developments. 

No  single  approach  can  ever  be  expected  to  be  the  only  sensible  one  for  a  broad  class  of 
statistical  problems  such  as  those  associated  with  the  analysis  of  categorical  data.  Yet  the 
interpretability  and  flexibility  of  the  loglinear  model  approach  and  the  computational  methods 
available  for  its  application  have  moved  us  towards  a  comprehensive  approach  to  the  analysis 
of categorical data.

2. A Brief Review of Loglinear Model Developments

The  literature  on  the  analysis  of  categorical  data  contains  hundreds  of  papers  authored  by 
many  of  statistics'  most  distinguished  researchers.  In  this  section,  we  trace  a  path  through  this 
literature  of  the  past  50  years  that  highlights  the  evolution  of  the  loglinear  model  and  its 
application.  This  brief  review  ignores  the  contributions  of  a  large  number  of  individuals  who 
focussed  primarily  on  other  forms  of  models,  methods  of  estimation  other  than  maximum 
likelihood,  and  issues  such  as  the  adequacy  of  large-sample  properties  of  test  statistics.  For  an 
alternative review and a discussion of nonstandard applications see Imrey, Koch, Stokes, et al.
(1981, 1982).


Although Yule (1900) focussed on the cross-product ratio as a measure of association in 2X2
tables and developed ideas on association in $2^n$ tables, 35 years passed before Bartlett (1935)
utilized Yule’s ideas to define the concept of second-order interaction in 2X2X2 tables. For a
2X2 table with expected values $\{m_{ij}\}$, Yule’s cross-product ratio is:

$$\alpha = \frac{m_{11} m_{22}}{m_{12} m_{21}}. \qquad (2.1)$$

Bartlett’s no-second-order interaction model for the expected values in a 2X2X2 table, $\{m_{ijk}\}$,
was based on equating the values of $\alpha$ in each layer of the table, i.e.,

$$\frac{m_{111} m_{221}}{m_{121} m_{211}} = \frac{m_{112} m_{222}}{m_{122} m_{212}}. \qquad (2.2)$$

Bartlett then went on to derive maximum likelihood estimates of the $\{m_{ijk}\}$ by solving a cubic
equation.
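
As a small numerical illustration of Yule's cross-product ratio and of Bartlett's condition, the following sketch computes the ratio in each layer of a 2X2X2 table and checks that the two values agree. The table entries and the function names are hypothetical and were chosen only for illustration; they do not come from Bartlett's paper.

```python
def cross_product_ratio(table):
    """Yule's cross-product (odds) ratio for a 2x2 table indexed table[i][j]."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def layer(table3, k):
    """Extract the 2x2 layer k from a 2x2x2 table indexed table3[i][j][k]."""
    return [[table3[i][j][k] for j in range(2)] for i in range(2)]


# Hypothetical expected values, indexed m3[i][j][k].
m3 = [[[20.0, 10.0], [10.0, 20.0]],
      [[5.0, 10.0], [10.0, 80.0]]]

alpha_layer_1 = cross_product_ratio(layer(m3, 0))
alpha_layer_2 = cross_product_ratio(layer(m3, 1))

# Bartlett's no-second-order-interaction condition: equal ratios across layers.
no_second_order_interaction = abs(alpha_layer_1 - alpha_layer_2) < 1e-12
print(alpha_layer_1, alpha_layer_2, no_second_order_interaction)
```

The association between the first two variables is the same in both layers of this table, so the condition holds even though the cell values themselves differ from layer to layer.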


It was not for another 20 years that Roy and Kastenbaum (1956) were to generalize Bartlett’s
approach to IXJXK tables. Their method, as described in Kastenbaum and Lamphiear (1959),
for solving the likelihood equations under no-second-order interaction was considered to be
computationally complex (involving an iterative solution of (I-1)(J-1)(K-1) simultaneous
third-degree equations), and neither the model nor the method was easily generalized to higher
dimensions. Indeed, it was not until Birch (1963) converted Bartlett's and Roy and
Kastenbaum’s multiplicative definition of no-second-order interaction to an additive
analysis-of-variance-like model in the logarithmic scale that key features of the loglinear model
approach to multi-way tables emerged (see also the development in Good, 1963). Birch also
presented a simple yet elegant result that linked the basic sampling distributions for
contingency tables (Poisson and multinomial) and at the same time elucidated the relationship
between loglinear and logit models. What remained to be done before the approach could be
implemented in practice was to come up with a simple computational technique for solving the
likelihood equations.

The timing was propitious because iterative techniques that involved large numbers of
computations had recently become a reasonable way to solve maximization problems due to the
availability of high-speed computers. While working on the National Halothane Study in
1965-66, Bishop rediscovered an iterative procedure proposed for a related categorical data
problem by Deming and Stephan (1940). Although others (e.g. see Darroch, 1962) had
proposed equivalent iterative techniques for special cases, Bishop (1967) presented a relatively
general computer program implementing the Deming-Stephan algorithm and showed how it was
applicable for solving the likelihood equations associated with the class of loglinear models
described by Birch.

Many statisticians were now focussing on loglinear model methods, and adapting them for use
in connection with the analysis of incomplete contingency tables, Markov chains, and other
non-standard problems. Important advances were made by authors such as Bhapkar, Bock,
Darroch, Goodman, Haberman, Kullback, Plackett, and Nerlove and Press. One specific line of
work, initiated by Nelder and Wedderburn (1972), linked the analysis of categorical data using
loglinear and logit models to the analysis of measurement data using linear models with normal
errors via what they called generalized linear models. As implemented in the computer
package GLIM (Baker and Nelder, 1978), this approach provided additional stimulus for the use
of loglinear models and presented an alternative to the iterative proportional fitting technique
introduced by Bishop.

The  research  work  of  the  1960's  treated  the  problems  associated  with  categorical  data  analysis 
using  loglinear  models  as  being  separate  from  those  involving  other  forms  of  linear  models  and 
sampling  distributions  other  than  Poisson  and  multinomial.  But  as  the  work  of  Nelder  and 
Wedderburn  showed,  these  separate  streams  of  research  could  be  linked.  The  key  to  the 
linkage  was  the  existence  of  general  results  on  exponential  families  and  their  sufficient  statistics 
that originated in the 1930’s with Fisher (e.g. see Dempster, 1971, and the discussion in
Andersen, 1980). From the perspective of exponential family theory the interpretation of
loglinear models was even closer to that of linear models than the parallel notation suggested.
In the next section, we describe some of the loglinear model results that are part of this more
general statistical theory, but we also stress special aspects of the interpretation of loglinear
models and a unique loglinear/multinomial result due to Birch.

3.  Loglinear  Models,  Contingency  Tables,  and  Likelihood  Theory 
A.  Notation  for  the  2X2  table 


It  has  been  suggested,  only  partially  in  jest,  that  virtually  all  important  statistical  ideas  can  be 
described  and  illustrated  in  the  context  of  the  2X2  contingency  table.  While  this  is  clearly  not 
the  case,  the  2X2  table  provides  a  useful  starting  place  for  a  discussion  of  loglinear  models. 


We begin by denoting the observed count for the (i,j) cell of a 2X2 contingency table by $x_{ij}$
and the totals for the ith row and jth column by $x_{i+}$ and $x_{+j}$, respectively. The $\{x_{ij}\}$ are
typically taken to be realizations of random variables whose expectations we denote by $\{m_{ij}\}$.
These expected values can now be rewritten in loglinear model form using analysis of variance
(ANOVA) notation:

$$\log m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}, \qquad (3.1)$$

where

$$\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0.$$

Although the model is in a general form applicable to IXJ tables, for 2X2 tables there
are only 4 distinct parameters: $u$, $u_{1(1)}$, $u_{2(1)}$, and $u_{12(11)}$. The 3 subscripted parameters are
expressible as

$$u_{1(1)} = \tfrac{1}{4} \log \frac{m_{11} m_{12}}{m_{21} m_{22}}, \qquad
u_{2(1)} = \tfrac{1}{4} \log \frac{m_{11} m_{21}}{m_{12} m_{22}}, \qquad
u_{12(11)} = \tfrac{1}{4} \log \frac{m_{11} m_{22}}{m_{12} m_{21}}.$$

We note that $u_{12(11)}$ is simply a function of Yule's cross-product ratio, $\alpha = m_{11} m_{22}/m_{12} m_{21}$,
and $u_{1(1)}$ and $u_{2(1)}$ are functions of similar cross-product ratios.

Setting $u_{12(11)} = 0$ is equivalent to setting $\alpha = 1$ and corresponds to independence of the
variable  for  rows  and  the  variable  for  columns.  Thus  we  have  seen  two  special  features  of 
loglinear  models: 

(i)  all  subscripted  parameters  are  expressible  as  logarithms  of  cross-product  ratios 
or  functions  of  them. 

(ii)  setting  some  loglinear  model  parameters  equal  to  zero  often  leads  to  a  model 
which  can  be  interpreted  in  terms  of  independence  of  variables  underlying  the 
dimensions  of  the  table. 

These features, which are shared by loglinear models for IXJ and multi-way tables, mean that
loglinear models can be interpreted using either the ANOVA-like structure or generalizations of
cross-product ratios and independence concepts.
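
The two features can be checked numerically. The sketch below computes the four u-terms of a 2X2 table as ANOVA-style contrasts of the log expected values under the sum-to-zero constraints, and confirms that the interaction term equals one quarter of the log cross-product ratio and vanishes for a table with proportional rows. The tables and the function name `u_terms_2x2` are hypothetical.

```python
import math


def u_terms_2x2(m):
    """Return (u, u1, u2, u12) for a 2x2 table m[i][j] of positive expected values.

    u1, u2 and u12 are the level-1 parameters u_1(1), u_2(1) and u_12(11);
    the remaining parameters follow from the sum-to-zero constraints.
    """
    l = [[math.log(m[i][j]) for j in range(2)] for i in range(2)]
    u = (l[0][0] + l[0][1] + l[1][0] + l[1][1]) / 4
    u1 = (l[0][0] + l[0][1] - l[1][0] - l[1][1]) / 4
    u2 = (l[0][0] - l[0][1] + l[1][0] - l[1][1]) / 4
    u12 = (l[0][0] - l[0][1] - l[1][0] + l[1][1]) / 4
    return u, u1, u2, u12


m = [[30.0, 10.0], [20.0, 40.0]]            # hypothetical expected values
u, u1, u2, u12 = u_terms_2x2(m)
alpha = (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])

# Rebuild log m_ij from the u-terms; sign is +1 for level 1 and -1 for level 2.
sign = [1.0, -1.0]
rebuilt = [[math.exp(u + sign[i] * u1 + sign[j] * u2 + sign[i] * sign[j] * u12)
            for j in range(2)] for i in range(2)]

# A table with proportional rows has alpha = 1, hence u12 = 0 (independence).
independent = [[15.0, 5.0], [30.0, 10.0]]
u12_independent = u_terms_2x2(independent)[3]
print(u12, math.log(alpha) / 4, u12_independent)
```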


The  use  of  ANOVA-like  notation  here  is  at  least  in  part  illusory,  however.  There  is  no 
response  variable  on  the  left-hand  side  of  equation  (3.1),  only  a  log-expected  count.  Thus  the 
u-term  parameters  really  cannot  be  thought  of  as  "effects"  of  one  variable  on  another.  This 
form of ANOVA interpretation will prove useful only when we can convert a loglinear model
into a logit model, as we illustrate in the next subsection.


B.  Loglinear  models  for  IXJXK  tables 


For a three-way table of counts, $\{x_{ijk}\}$, the general loglinear model for the corresponding
expected values, $\{m_{ijk}\}$, can be written as:

$$\log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}, \qquad (3.6)$$

where, as in the usual ANOVA model, all subscripted parameters sum to zero over each
subscript, e.g.

$$\sum_i u_{1(i)} = \sum_i u_{12(ij)} = \sum_i u_{123(ijk)} = 0.$$

In the special case where I = J = K = 2, there are only 8 distinct parameters: $u$, $u_{1(1)}$, $u_{2(1)}$,
$u_{3(1)}$, $u_{12(11)}$, $u_{13(11)}$, $u_{23(11)}$, and $u_{123(111)}$. Each of the 7 subscripted parameters is expressible as
a function of the ratio of two cross-product ratios, e.g.

$$u_{123(111)} = \frac{1}{8} \log \left[ \frac{m_{111} m_{221} / m_{121} m_{211}}{m_{112} m_{222} / m_{122} m_{212}} \right].$$

These expressions are standard ANOVA-like contrasts for the log-expected counts. In an IXJXK
table each subscripted u-term can be rewritten as a linear combination of the logarithms of
the ratios of cross-product ratios associated with the corresponding parameters for all possible
2X2X2 subtables.


In the 2X2X2 table, setting $u_{123(111)} = 0$ is equivalent to Bartlett’s condition for
no-second-order interaction given in expression (2.2). In the IXJXK table, setting $u_{123(ijk)} = 0$ for all i, j,
and k is equivalent to Roy and Kastenbaum’s generalization of Bartlett's condition. This is one
of four special cases of the general loglinear model, (3.6), found by setting sets of u-terms
equal to zero:

(a) $u_{123(ijk)} = 0$,

(b) $u_{12(ij)} = u_{123(ijk)} = 0$,

(c) $u_{12(ij)} = u_{13(ik)} = u_{123(ijk)} = 0$,

(d) $u_{12(ij)} = u_{13(ik)} = u_{23(jk)} = u_{123(ijk)} = 0$,

each for all i, j, and k. The other three special cases each can be re-expressed so that an
interpretation in terms of independence is possible. Model (d) corresponds to complete
independence among the variables for the three dimensions of the table. Model (c)
corresponds to independence of variable 1 and variables 2 and 3, considered jointly. Finally
Model (b) corresponds to conditional independence of variables 1 and 2 given the value of
variable 3.
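
To make the conditional independence interpretation concrete, the following sketch builds a 2X2X2 table of expected values in which variables 1 and 2 are independent within each level of variable 3, and then compares the cross-product ratio within each layer with the ratio in the table collapsed over variable 3. All probabilities and names are hypothetical. The example also illustrates a standard caution that is not specific to this paper: conditional independence does not imply independence in the collapsed table.

```python
def cross_product_ratio(table):
    """Cross-product ratio of a 2x2 table indexed table[i][j]."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


total = 200.0
layer_weight = [0.5, 0.5]                  # P(variable 3 = k)
row_given_k = [[0.8, 0.2], [0.2, 0.8]]     # row_given_k[k][i] = P(var 1 = i | k)
col_given_k = [[0.8, 0.2], [0.2, 0.8]]     # col_given_k[k][j] = P(var 2 = j | k)

# Within each layer the expected value factorises into a row and a column part.
m = [[[total * layer_weight[k] * row_given_k[k][i] * col_given_k[k][j]
       for k in range(2)] for j in range(2)] for i in range(2)]

layer_alphas = [
    cross_product_ratio([[m[i][j][k] for j in range(2)] for i in range(2)])
    for k in range(2)
]
collapsed = [[m[i][j][0] + m[i][j][1] for j in range(2)] for i in range(2)]
marginal_alpha = cross_product_ratio(collapsed)
print(layer_alphas, marginal_alpha)
```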

Once again we get, in addition to the ANOVA structure, the two features of loglinear models
alluded to above:

(i)  all  subscripted  parameters  are  expressible  as  logarithms  of  ratios  of  cross- 
product  ratios  or  functions  of  them, 

(ii)  several  special  cases  of  the  general  model  are  interpretable  in  terms  of 
independence  or  conditional  independence. 

To  these  features  we  can  add  a  third  when  there  is  a  distinction  between  explanatory  and 
response  variables  for  the  underlying  dimensions. 

Let us begin with the case of a 2XJXK table in which the first variable is the response, and
the other two are explanatory. The odds of being in category 1 of the response variable
versus being in category 2, given the levels of the explanatory variables, are a natural quantity
of interest. The log-odds can be expressed in terms of the loglinear model parameters simply
by taking differences, i.e.

$$\log \left( \frac{m_{1jk}}{m_{2jk}} \right) = \log m_{1jk} - \log m_{2jk}
= 2 \left[ u_{1(1)} + u_{12(1j)} + u_{13(1k)} + u_{123(1jk)} \right].$$

Relabelling the u-terms using a new set of parameters $w = 2u_{1(1)}$, $w_{2(j)} = 2u_{12(1j)}$,
$w_{3(k)} = 2u_{13(1k)}$, and $w_{23(jk)} = 2u_{123(1jk)}$, we get the logit model:

$$\log \left( \frac{m_{1jk}}{m_{2jk}} \right) = w + w_{2(j)} + w_{3(k)} + w_{23(jk)}, \qquad (3.10)$$

where

$$\sum_j w_{2(j)} = \sum_k w_{3(k)} = \sum_j w_{23(jk)} = \sum_k w_{23(jk)} = 0. \qquad (3.11)$$

The ANOVA-like parameters in this logit model are interpretable in terms of the "effects" of
the explanatory variables on the log-odds of the response. For example, $w_{23(jk)}$ is the
interactive effect of variables 2 and 3 on the log-odds when variable 2 is at level j and
variable 3 is at level k over and above the separate effects for variables 2 and 3. Note that
none of the u-terms in the loglinear model involving only the explanatory variables are present
in the logit version of the model.
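
The passage from the loglinear to the logit parametrization can be verified directly. The sketch below computes all u-terms of a three-way table as ANOVA-style contrasts of the log expected values, forms the w-terms by doubling the u-terms that involve the response, and checks that they reproduce the log-odds in every (j,k) cell. The 2X2X3 table and the function names are hypothetical.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def u_terms(m):
    """ANOVA-style decomposition of log m[i][j][k] under sum-to-zero constraints."""
    ri, rj, rk = range(len(m)), range(len(m[0])), range(len(m[0][0]))
    l = [[[math.log(m[i][j][k]) for k in rk] for j in rj] for i in ri]
    u = mean(l[i][j][k] for i in ri for j in rj for k in rk)
    u1 = [mean(l[i][j][k] for j in rj for k in rk) - u for i in ri]
    u2 = [mean(l[i][j][k] for i in ri for k in rk) - u for j in rj]
    u3 = [mean(l[i][j][k] for i in ri for j in rj) - u for k in rk]
    u12 = [[mean(l[i][j][k] for k in rk) - u - u1[i] - u2[j] for j in rj] for i in ri]
    u13 = [[mean(l[i][j][k] for j in rj) - u - u1[i] - u3[k] for k in rk] for i in ri]
    u23 = [[mean(l[i][j][k] for i in ri) - u - u2[j] - u3[k] for k in rk] for j in rj]
    u123 = [[[l[i][j][k] - u - u1[i] - u2[j] - u3[k]
              - u12[i][j] - u13[i][k] - u23[j][k]
              for k in rk] for j in rj] for i in ri]
    return u1, u12, u13, u123


# Hypothetical expected values for a 2x2x3 table; variable 1 is the response.
m = [[[12.0, 8.0, 20.0], [6.0, 14.0, 10.0]],
     [[9.0, 16.0, 5.0], [18.0, 7.0, 25.0]]]
u1, u12, u13, u123 = u_terms(m)

# Only u-terms involving the response (variable 1) enter the logit model.
w = 2 * u1[0]
w2 = [2 * u12[0][j] for j in range(2)]
w3 = [2 * u13[0][k] for k in range(3)]
w23 = [[2 * u123[0][j][k] for k in range(3)] for j in range(2)]

max_gap = max(
    abs(math.log(m[0][j][k] / m[1][j][k]) - (w + w2[j] + w3[k] + w23[j][k]))
    for j in range(2) for k in range(3)
)
print(max_gap)
```

The terms $u_{2(j)}$, $u_{3(k)}$, and $u_{23(jk)}$ never need to be computed here, which mirrors the remark that u-terms involving only the explanatory variables are absent from the logit model.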


For an IXJXK table in which the first variable is the response, the loglinear model of
expression (3.6) can be rewritten as a set of I-1 logit models for the log-odds,

$$\log \left( \frac{m_{ijk}}{m_{Ijk}} \right), \qquad i = 1, 2, \ldots, I-1,$$

with each logit model being of the form of expressions (3.10) and (3.11). If we use a
transformation other than logarithmic for the odds in these logit models, then we get other members of
Nelder and Wedderburn’s GLIM family. For example, the probit or integrated normal scale is

$$\Phi^{-1} \left[ \frac{m_{1jk}}{m_{1jk} + m_{2jk}} \right],$$

where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal c.d.f. Among the members of the GLIM
family, only the logit (or loglinear) model includes as special cases the models that are
interpretable in terms of independence and conditional independence of the underlying
variables.


Even  in  the  absence  of  the  statistical  estimation  results  in  the  following  subsection,  the 
interpretability  of  loglinear  models  makes  them  an  ideal  candidate  for  the  basis  of  a 
comprehensive  approach  to  the  analysis  of  categorical  data. 

C.  Key  results  from  likelihood  theory 

There are three standard sampling models for the observed counts in contingency tables. We
begin by describing them for a singly subscripted vector of t cells, $x^T = (x_1, \ldots, x_t)$. This
notation for an arbitrarily structured collection of t cells will prove to be of great use in the
non-contingency-table problems described in the next section of the paper. For the 2X2 table
t = 4, and for the general three-way table t = IJK. Now let $m^T = (m_1, \ldots, m_t)$ be the
vector of expected values that are assumed to be functions of unknown parameters
$\theta^T = (\theta_1, \theta_2, \ldots, \theta_s)$, where s < t. Thus we can write $m = m(\theta)$. The three sampling models are:

POISSON MODEL. The $\{x_i\}$ are observations from independent Poisson random
variables with means $\{m_i\}$ and likelihood function

$$\prod_{i=1}^{t} \left[ m_i^{x_i} \exp(-m_i) / x_i! \right]. \qquad (3.12)$$

MULTINOMIAL MODEL. The total count $N = \sum_{i=1}^{t} x_i$ is fixed, and the observations are a random
sample of size N from an infinite population where the underlying cell probabilities are $\{m_i/N\}$, and the
likelihood is

$$N! \, N^{-N} \prod_{i=1}^{t} \left( m_i^{x_i} / x_i! \right). \qquad (3.13)$$

PRODUCT-MULTINOMIAL MODEL. The cells are partitioned into sets, and each
set has an independent multinomial structure, as in the multinomial model.

Loglinear models in this setting come about by representing the vector of log expectations,
$(\log m_1, \ldots, \log m_t)$, as a linear combination of the parameters in the vector $\theta$. The
following pair of results now follow directly from exponential family theory for the Poisson
and multinomial sampling schemes.
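
The close relationship between the Poisson and multinomial likelihoods can be seen numerically. The sketch below evaluates the logarithms of expressions (3.12) and (3.13) for a hypothetical vector of counts and checks that, whenever the expected values sum to N, the two log-likelihoods differ only by a constant that does not depend on m. The counts, expected values, and function names are assumptions made for illustration.

```python
import math


def poisson_loglik(x, m):
    """Log of the Poisson likelihood (3.12): independent counts with means m."""
    return sum(xi * math.log(mi) - mi - math.lgamma(xi + 1)
               for xi, mi in zip(x, m))


def multinomial_loglik(x, m):
    """Log of the multinomial likelihood (3.13) with cell probabilities m_i / N."""
    n = sum(x)
    return (math.lgamma(n + 1) - n * math.log(n)
            + sum(xi * math.log(mi) - math.lgamma(xi + 1)
                  for xi, mi in zip(x, m)))


x = [10, 20, 30, 40]                  # hypothetical observed counts, N = 100
m_a = [15.0, 15.0, 35.0, 35.0]        # two candidate sets of expected values,
m_b = [25.0, 25.0, 25.0, 25.0]        # each summing to N

n = sum(x)
constant = n * math.log(n) - n - math.lgamma(n + 1)   # log of N^N e^{-N} / N!
gap_a = poisson_loglik(x, m_a) - multinomial_loglik(x, m_a)
gap_b = poisson_loglik(x, m_b) - multinomial_loglik(x, m_b)
print(gap_a, gap_b, constant)
```

Because the gap is the same for every m with total N, maximizing either likelihood over such m gives the same estimated expected values.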

RESULT 1. Corresponding to each parameter in $\theta$ is a minimal sufficient statistic
(MSS) that is expressible as a linear combination of the $\{x_i\}$. (More formally, if M
is used to denote the loglinear model specified by $m = m(\theta)$, then the MSS's are
given by the projection of x onto M, $P_M x$.)

RESULT 2. The maximum likelihood estimate (MLE), $\hat{m}$, of m, if it exists, is
unique and satisfies the likelihood equations

$$P_M \hat{m} = P_M x. \qquad (3.14)$$

Necessary  and  sufficient  conditions  for  the  existence  of  a  solution  to  the  likelihood  equations 
of expression (3.14) are given in Haberman (1974), and for various special cases by a variety of
authors. Nonexistence occurs when the likelihood is maximized on the boundary of the
parameter space, and this corresponds to some estimated expected values being equal to zero. Although deriving
constructive  conditions  for  the  existence  of  MLE’s  has  been  viewed  by  many  as  an  esoteric 
research  problem,  in  fact  it  is  a  critical  component  to  computational  methods  for  solving  the 
likelihood  equations  and  is  of  great  practical  import  for  those  who  wish  to  analyze  large  sparse 
contingency  tables. 
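
One way to see the likelihood equations at work is to fit a small model by Newton's method and confirm that the fitted values reproduce the sufficient statistics. The sketch below is a minimal illustration, assuming a design matrix X whose columns span the model M, so that expression (3.14) is equivalent to $X^T \hat{m} = X^T x$. It fits the independence model to a hypothetical 2X2 table under Poisson sampling. The function names `solve` and `fit_loglinear`, the ±1 coding, and the data are all assumptions; no safeguard for nonexistent MLE's is included.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def expected(X, theta):
    """Expected values m = exp(X theta), one per cell."""
    return [math.exp(sum(xc * tc for xc, tc in zip(row, theta))) for row in X]


def fit_loglinear(X, x, iterations=50, tol=1e-12):
    """Newton's method for a Poisson loglinear model; column 0 of X is the constant."""
    t, s = len(X), len(X[0])
    theta = [math.log(sum(x) / t)] + [0.0] * (s - 1)
    for _ in range(iterations):
        m = expected(X, theta)
        score = [sum(X[r][c] * (x[r] - m[r]) for r in range(t)) for c in range(s)]
        info = [[sum(X[r][c] * m[r] * X[r][d] for r in range(t)) for d in range(s)]
                for c in range(s)]
        step = solve(info, score)
        theta = [tc + sc for tc, sc in zip(theta, step)]
        if max(abs(sc) for sc in step) < tol:
            break
    return theta, expected(X, theta)


# Cells ordered (1,1), (1,2), (2,1), (2,2); columns: constant, row, column.
X = [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, 1.0], [1.0, -1.0, -1.0]]
x = [10.0, 20.0, 30.0, 40.0]
theta_hat, m_hat = fit_loglinear(X, x)

# Sufficient statistics X^T x and their fitted counterparts X^T m_hat.
stats_observed = [sum(X[r][c] * x[r] for r in range(4)) for c in range(3)]
stats_fitted = [sum(X[r][c] * m_hat[r] for r in range(4)) for c in range(3)]
print(m_hat)
```

For the independence model the fitted values have the familiar closed form (row total times column total, divided by N), which gives an independent check on the iteration.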

We now come to the third key result, which was first given by Birch (1963) and which
unifies the three sampling schemes and links the MLE’s for loglinear and logit models. For
product-multinomial sampling situations, the basic multinomial constraints (i.e., that the counts
must add up to the multinomial sample sizes) must be taken into account. One way to think
about this in the context of loglinear models is to recall that, from Result 1, these sample
sizes are marginal totals, which under a simple multinomial or Poisson model are MSS's
corresponding to some of the parameters in $\theta$ specifying the loglinear model M, i.e., $m = m(\theta)$.
Under product-multinomial sampling these parameters are fixed by the constraints. What we do
is consider a logit model, M*, where these components of $\theta$ "drop out."

More formally, let M* be a logit model for m under product-multinomial sampling which
corresponds to a loglinear model M under Poisson sampling such that the multinomial
constraints "fix" a subset of the parameters, $\theta$, used to specify M. Then Birch's result is:

RESULT 3. The MLE of m under product-multinomial sampling for the model M*
is the same as the MLE of m under Poisson sampling for the model M.


This result is directly related to a more general theorem from exponential family theory
which states that, if we have an exponential family density in minimal form with MSS’s
$h_1, h_2, \ldots, h_s$, then the conditional density for $h_1, h_2, \ldots, h_k$ given $h_{k+1}, h_{k+2}, \ldots, h_s$ has the same
exponential family form. Moreover, the exponential family parameters for this conditional
density are the ones from the original density for which $h_1, h_2, \ldots, h_k$ are MSS’s, and
$h_1, h_2, \ldots, h_k$ are the corresponding MSS's in the conditional density (see Andersen, 1980, pp. 82-83 for
a formal statement and proof). What is so special about Result 3, the loglinear/multinomial
version of this theorem, especially in the context of contingency tables, is that the MSS's, $P_M x$,
are marginal totals for the original vector, x, and thus the conditional density has the
minimal form for the conditional distribution of the response variables given a set of
explanatory variables, the cross-classification of which is fixed by the product-multinomial
sampling scheme. This unique feature of loglinear models and their associated
sampling schemes distinguishes them from other forms of linear models. For example, in
standard linear model theory with normal error terms one cannot change one set of linear
model results into another by conditioning on marginal totals unless one is working with a
completely balanced factorial design.
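
Result 3 can be checked numerically on a small table. The sketch below fits, by Newton's method, a logit model with additive effects of two explanatory variables to a hypothetical 2X2X2 table under product-multinomial (here binomial) sampling. It then verifies that the fitted expected counts reproduce all three sets of two-dimensional marginal totals and have equal cross-product ratios in the two layers. By Result 2 these properties characterize the MLE under Poisson sampling for the loglinear model of no-second-order interaction, so the two sets of fitted values coincide. The data, the ±1 coding, and the helper names are assumptions.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def probabilities(design, w):
    """Inverse logit of the linear predictor for each (j, k) cell."""
    return [1.0 / (1.0 + math.exp(-sum(d * wc for d, wc in zip(row, w))))
            for row in design]


x = [[[10.0, 20.0], [30.0, 40.0]],
     [[15.0, 25.0], [20.0, 60.0]]]          # hypothetical counts x[i][j][k]
cells = [(j, k) for j in range(2) for k in range(2)]
sign = [1.0, -1.0]
design = [[1.0, sign[j], sign[k]] for j, k in cells]    # w, w_2(j), w_3(k)
y = [x[0][j][k] for j, k in cells]                      # response category 1
n = [x[0][j][k] + x[1][j][k] for j, k in cells]         # totals fixed by design

w = [0.0, 0.0, 0.0]
for _ in range(50):
    p = probabilities(design, w)
    score = [sum(design[r][c] * (y[r] - n[r] * p[r]) for r in range(4))
             for c in range(3)]
    info = [[sum(design[r][c] * n[r] * p[r] * (1 - p[r]) * design[r][d]
                 for r in range(4)) for d in range(3)] for c in range(3)]
    step = solve(info, score)
    w = [wc + sc for wc, sc in zip(w, step)]
    if max(abs(sc) for sc in step) < 1e-12:
        break

p = probabilities(design, w)
m_hat = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
for r, (j, k) in enumerate(cells):
    m_hat[0][j][k] = n[r] * p[r]
    m_hat[1][j][k] = n[r] * (1 - p[r])

idx = range(2)
gap_12 = max(abs(sum(m_hat[i][j]) - sum(x[i][j])) for i in idx for j in idx)
gap_13 = max(abs(sum(m_hat[i][j][k] - x[i][j][k] for j in idx)) for i in idx for k in idx)
gap_23 = max(abs(sum(m_hat[i][j][k] - x[i][j][k] for i in idx)) for j in idx for k in idx)
layer_ratio = [(m_hat[0][0][k] * m_hat[1][1][k]) / (m_hat[0][1][k] * m_hat[1][0][k])
               for k in idx]
print(gap_12, gap_13, gap_23, layer_ratio)
```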

To illustrate these ideas we return to the 2X2X2 table, and the no-second-order interaction
model with $u_{123(ijk)} = 0$ for i, j, k = 1, 2. For the Poisson or multinomial sampling schemes,
the MSS’s of Result 1 are the two-dimensional marginal totals, $\{x_{ij+}\}$, $\{x_{i+k}\}$, and $\{x_{+jk}\}$
(except for linearly redundant statistics included for purposes of symmetry). Using Result 2,
we have that the MLE’s of the $\{m_{ijk}\}$, if they exist, must satisfy the likelihood equations:

$$\begin{aligned}
\hat{m}_{ij+} &= x_{ij+}, \qquad i,j = 1,2, \\
\hat{m}_{i+k} &= x_{i+k}, \qquad i,k = 1,2, \\
\hat{m}_{+jk} &= x_{+jk}, \qquad j,k = 1,2.
\end{aligned} \qquad (3.15)$$

If the second and third dimensions correspond to explanatory variables with $\{x_{+jk}\}$ fixed by
design, then we have a product-multinomial sampling scheme and the relevant logit model sets
$w_{23(jk)} = 0$ for j = 1,2, and k = 1,2. The MSS’s are now $\{x_{ij+}\}$ and $\{x_{i+k}\}$ and the likelihood
equations are still given by (3.15), since the third set of equations simply represents the
sampling constraints.

D.  Computing  MLE’s  for  multi-way  tables 

As  we  mentioned  in  Section  2,  for  many  of  the  special  versions  of  loglinear  models  such  as 
no-second-order  interaction  in  three-way  tables,  we  need  to  solve  the  likelihood  equations  in 
expression  (3.14)  using  some  type  of  iterative  procedure.  The  two  main  competitors  are  the 
Iterative Proportional Fitting Procedure (IPFP, e.g. see Bishop, Fienberg, and Holland, 1975),
which  has  linear  convergence  properties,  and  Newton's  method  (or  related  quadratic  convergence 
algorithms  such  as  the  one  used  in  the  GLIM  package).  For  a  discussion  of  advantages  and 
disadvantages  of  each  of  these  methods,  and  the  possibility  of  using  hybrid  algorithms,  see 
Fienberg  and  Meyer  (1983). 
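
For readers who have not seen it, the IPFP for the no-second-order-interaction model in a three-way table can be sketched in a few lines. The version below starts from a table of ones and cycles through the three sets of two-dimensional marginal totals, rescaling the current fitted values to match each set in turn. It is a minimal illustration with hypothetical data and a hypothetical function name, not a reproduction of any particular program, and it assumes all observed margins are positive.

```python
def ipf_no_three_factor(x, cycles=500, tol=1e-10):
    """Fit the no-second-order-interaction model to counts x[i][j][k] by IPFP."""
    ri, rj, rk = range(len(x)), range(len(x[0])), range(len(x[0][0]))
    m = [[[1.0 for _ in rk] for _ in rj] for _ in ri]
    for _ in range(cycles):
        worst = 0.0
        for i in ri:                       # match the {x_ij+} margins
            for j in rj:
                target = sum(x[i][j][k] for k in rk)
                current = sum(m[i][j][k] for k in rk)
                worst = max(worst, abs(target - current))
                for k in rk:
                    m[i][j][k] *= target / current
        for i in ri:                       # match the {x_i+k} margins
            for k in rk:
                target = sum(x[i][j][k] for j in rj)
                current = sum(m[i][j][k] for j in rj)
                worst = max(worst, abs(target - current))
                for j in rj:
                    m[i][j][k] *= target / current
        for j in rj:                       # match the {x_+jk} margins
            for k in rk:
                target = sum(x[i][j][k] for i in ri)
                current = sum(m[i][j][k] for i in ri)
                worst = max(worst, abs(target - current))
                for i in ri:
                    m[i][j][k] *= target / current
        if worst < tol:                    # all margins already matched this cycle
            break
    return m


x = [[[10.0, 20.0], [30.0, 40.0]],
     [[15.0, 25.0], [20.0, 60.0]]]          # hypothetical observed counts
m_hat = ipf_no_three_factor(x)

idx = range(2)
gap_12 = max(abs(sum(m_hat[i][j]) - sum(x[i][j])) for i in idx for j in idx)
gap_13 = max(abs(sum(m_hat[i][j][k] - x[i][j][k] for j in idx)) for i in idx for k in idx)
gap_23 = max(abs(sum(m_hat[i][j][k] - x[i][j][k] for i in idx)) for j in idx for k in idx)
layer_ratio = [(m_hat[0][0][k] * m_hat[1][1][k]) / (m_hat[0][1][k] * m_hat[1][0][k])
               for k in idx]
print(gap_12, gap_13, gap_23, layer_ratio)
```

Each rescaling step multiplies the table by factors that depend on only two of the three subscripts, so the fitted values never acquire a three-factor term; this is why the layer cross-product ratios remain equal throughout the iteration.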

The  IPFP  algorithm  as  implemented  in  the  BMDP  package  uses  a  parametrization  for 
loglinear  and  logit  models  different  from  the  parametrization  in  the  version  of  Newton’s 
method  used  by  the  GLIM  package.  Both  packages  will,  however,  produce  the  same  estimated 
expected  values  satisfying  the  likelihood  equations.  What  is  needed  both  here,  and  in  the 
context  of  linear  models  more  generally,  is  flexible  software  that  can  convert  from  one 
parametrization  to  the  other  with  minimal  effort  on  the  part  of  the  user.  The  technology  for 
doing  this  already  exists.  What  we  need  to  do  as  statisticians  is  remember  that  the  form  of 
parametrization  or  the  choice  of  a  basis  for  interpreting  linear  models  need  not  necessarily  be 
the  same  as  the  parametrization  or  basis  actually  used  for  doing  the  computation.  All  too 
often  we  let  interpretation  drive  computation  or  vice  versa.  This  need  not  happen. 

4.  Flexibility  of  the  Loglinear  Model  Approach 

The  likelihood  results  of  Section  3C  are  quite  general  and  apply  to  large  numbers  of 
categorical  data  problems  other  than  those  where  the  parameters  in  the  model  are  directly 
associated  with  the  dimensions  of  a  complete  multi-way  contingency  table.  Before  the  general 
results  had  been  derived,  statisticians  had  often  approached  each  special  problem  as  a  separate 
enterprise,  sometimes  using  loglinear  models  and  sometimes  not.  For  example,  the  entire 


literature  on  paired-comparisons  (David,  1963),  and  the  Bradley-Terry  model  (Bradley  and 
Terry,  1952)  and  its  generalizations  in  particular,  was  developed  without  reference  to  loglinear 
or  logit  models  per  se.  Cox  (1970)  took  special  note  of  the  logistic  form  of  the  Bradley-Terry 
model  in  his  book  on  the  analysis  of  binary  data,  and  formal  links  to  the  loglinear  model 
theory  and  literature  appeared  in  Imrey,  Johnson,  and  Koch  (1976),  Fienberg  and  Larntz  (1976), 
and  Fienberg  (1979).  Other  examples  of  where  categorical  data  problems  have  been 
restructured  and  analyzed  directly  using  loglinear  model  techniques  include  capture-recapture 
analysis (Fienberg, 1972), latent structure analysis (Goodman, 1974), Guttman scaling (Goodman,
1975), and Milgram’s small world problem (Fienberg and Lee, 1975).

Three  other  topics  that  have  recently  been  linked  to  the  loglinear  model  literature  are  (a)  the 
analysis of censored survival data, (b) the analysis of social and other network data, and (c)
the  analysis  of  survey  and  intelligence  test  data  using  the  Rasch  model.  We  discuss  each  in 
turn,  and  provide  some  relevant  references. 


For the analysis of survival data interest often focuses on the form of the hazard function

$$h(t \mid x) = \frac{f(t \mid x)}{1 - F(t \mid x)} , \qquad (4.1)$$

where $f(t \mid x)$ and $F(t \mid x)$ are the pdf and cdf at time $t$ given $x$, an associated set of fixed
covariates. Cox (1972) introduced a proportional hazards model of the form

$$h(t \mid x) = h_0(t)\, e^{x\beta} , \qquad (4.2)$$

and much of the discussion by Cox and others (such as Breslow in the formal discussion
following Cox's paper) made reference to the links between the analysis of expression (4.2) and
the categorical data literature (e.g. through Mantel-Haenszel tests). A more formal linkage is
possible, especially in the case where the covariates, $x$, are categorical and the underlying
hazard function, $h_0(t)$, is piecewise constant (see Holford, 1976, 1980). In this case not only is
the hazard function loglinear, but so is the likelihood¹ after using a transformation to an
"equivalent" Poisson sampling model based on an extension of Birch's (1963) result (Laird and
Oliver, 1981). The latter authors then show how to estimate $\beta$ in this special case of Cox's
model using IPFP. Related results have been alluded to somewhat less directly by Aitkin and
Clayton (1980) and Whitehead (1980), who explain how to use GLIM to analyze censored
survival data. An important feature of this result is the ease with which it generalizes to
other related survival problems such as those involving competing risks.

¹ Actually it is an affine translation of a loglinear model likelihood.
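The "equivalent" Poisson sampling model can be illustrated numerically. The following is a minimal sketch, not the construction of Holford or of Laird and Oliver: it assumes that each cell (a time interval crossed with a covariate category) contributes `events * log(rate) - rate * exposure` to the piecewise-constant hazard log-likelihood, and it checks that a Poisson log-likelihood with mean `rate * exposure` differs from it only by a term that does not involve the rates. The function names and the cell data are hypothetical.

```python
import math

def piecewise_exp_loglik(rate, events, exposure):
    """Log-likelihood contribution of one cell under a constant hazard `rate`."""
    return events * math.log(rate) - rate * exposure

def poisson_loglik(rate, events, exposure):
    """Poisson log-likelihood for `events` with mean rate * exposure."""
    mean = rate * exposure
    return events * math.log(mean) - mean - math.lgamma(events + 1)

# Hypothetical cells: (number of events, total time at risk).
cells = [(3, 12.5), (0, 4.0), (7, 30.2)]

def loglik_gap(rates):
    """Poisson minus piecewise-exponential log-likelihood, summed over cells."""
    return sum(
        poisson_loglik(r, d, t) - piecewise_exp_loglik(r, d, t)
        for r, (d, t) in zip(rates, cells)
    )

# The gap is the same for two unrelated sets of hazard rates, so both
# likelihoods are maximized by the same rates.
gap_a = loglik_gap([0.1, 0.2, 0.3])
gap_b = loglik_gap([0.5, 0.05, 1.2])
print(gap_a, gap_b)
```

Because the gap depends only on the data, any routine that fits a Poisson loglinear model to the event counts with log exposure as an offset would, under these assumptions, maximize the survival likelihood as well.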


A directed graph consists of a set of g nodes, and a collection of directed arcs connecting
pairs of nodes. Such graphs have been used to depict social networks describing relationships
between pairs of individual actors. Let y be a sociomatrix or adjacency matrix with
elements

$$y_{ij} = \begin{cases} 1 & \text{if a directed arc goes from } i \text{ to } j \\ 0 & \text{otherwise,} \end{cases} \qquad (4.3)$$

where, by convention, the diagonal terms $y_{ii} = 0$. Holland and Leinhardt (1981) note that for
any pair or dyad in a network, with adjacency matrix y,

$$y_{ij} y_{ji} + y_{ij}(1 - y_{ji}) + (1 - y_{ij}) y_{ji} + (1 - y_{ij})(1 - y_{ji}) = 1 , \qquad (4.4)$$

for $i \neq j$, and that exactly one of the terms on the left hand side of (4.4) is 1 and the
remaining three are 0. They then suggest the following model to describe these outcomes
(using Y as the matrix of random variables of which the adjacency matrix y is a realization):


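$$\begin{aligned}
\log \Pr[(1 - Y_{ij})(1 - Y_{ji}) = 1] &= \lambda_{ij} \\
\log \Pr[(1 - Y_{ij}) Y_{ji} = 1] &= \lambda_{ij} + \alpha_j + \beta_i + \theta \\
\log \Pr[Y_{ij}(1 - Y_{ji}) = 1] &= \lambda_{ij} + \alpha_i + \beta_j + \theta \\
\log \Pr[Y_{ij} Y_{ji} = 1] &= \lambda_{ij} + 2\theta + \alpha_i + \alpha_j + \beta_i + \beta_j + \rho ,
\end{aligned} \qquad (4.5)$$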


where the $\{\lambda_{ij}\}$ are "dyadic" effects included here (but only implicitly in Holland and
Leinhardt) to assure that the multinomial constraint (4.4) is satisfied, and where

$$\sum_i \alpha_i = \sum_j \beta_j = 0 . \qquad (4.6)$$
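To make the role of the dyadic effects concrete, the sketch below computes the four outcome probabilities for a single dyad. It assumes the usual Holland-Leinhardt parametrization, with a density parameter `theta`, a reciprocity parameter `rho`, and sender and receiver effects `alpha` and `beta`; the dyadic effect is then simply the log of the normalizing constant that makes the four probabilities sum to 1. The function name and the parameter values are hypothetical.

```python
import math

def dyad_probabilities(i, j, theta, rho, alpha, beta):
    """Return (lambda_ij, probs) for dyad (i, j).

    probs maps each outcome (y_ij, y_ji) to its probability.
    """
    # Log-probabilities up to the additive dyadic effect lambda_ij.
    log_weights = {
        (0, 0): 0.0,
        (0, 1): theta + alpha[j] + beta[i],
        (1, 0): theta + alpha[i] + beta[j],
        (1, 1): 2 * theta + alpha[i] + alpha[j] + beta[i] + beta[j] + rho,
    }
    total = sum(math.exp(w) for w in log_weights.values())
    lam = -math.log(total)  # chosen so the four probabilities sum to 1
    probs = {outcome: math.exp(lam + w) for outcome, w in log_weights.items()}
    return lam, probs

# Hypothetical parameter values; alpha and beta each sum to zero.
alpha = [0.4, -0.1, -0.3]
beta = [-0.2, 0.5, -0.3]
lam_01, probs_01 = dyad_probabilities(0, 1, theta=-1.0, rho=0.8, alpha=alpha, beta=beta)
lam_10, probs_10 = dyad_probabilities(1, 0, theta=-1.0, rho=0.8, alpha=alpha, beta=beta)
print(probs_01)
```

Swapping the roles of i and j leaves the dyadic effect unchanged and exchanges the two asymmetric outcomes, which is the symmetry exploited by the redundant table representation.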


If we assume that the dyads are independent, then we have a product-multinomial sampling
model with one observation per multinomial. Holland and Leinhardt make direct use of
exponential family theory results on maximum likelihood estimation to estimate the parameters
in (4.5). Fienberg and Wasserman (1981a, 1981b) note, however, that there is a link between
their model and a loglinear model for a multi-dimensional table representation of the
probabilities in (4.5). In particular, they work with the four-dimensional array:


Note that $X_{ijks} = X_{jisk}$, because the dyad (i,j) is the same as the dyad (j,i). By using this
redundant representation, we get a contingency table analogue to the Holland-Leinhardt model.
In particular, Meyer (1982) shows that fitting their model via maximum likelihood to $y = \{y_{ij}\}$
is equivalent to fitting a loglinear model to the newly created redundant array $\{x_{ijks}\}$, i.e. the
model of no-second-order interaction.
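The redundancy of the four-dimensional array is easy to see in a small example. The sketch below is an illustration only: it assumes that the entry for (i, j, k, s) is 1 when the dyad shows `y[i][j] == k` and `y[j][i] == s`, and 0 otherwise, which is an assumed definition consistent with the symmetry noted above. The function name and the example network are hypothetical.

```python
def dyad_array(y):
    """Build a g x g x 2 x 2 array from a 0/1 adjacency matrix y (assumed definition)."""
    g = len(y)
    x = [[[[0, 0], [0, 0]] for _ in range(g)] for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:  # diagonal cells stay empty
                x[i][j][y[i][j]][y[j][i]] = 1
    return x

# Hypothetical three-actor network.
y = [
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
]
x = dyad_array(y)

# Every dyad appears twice: once as (i, j) and once as (j, i).
g = len(y)
symmetric = all(
    x[i][j][k][s] == x[j][i][s][k]
    for i in range(g) for j in range(g) for k in (0, 1) for s in (0, 1)
)
print(symmetric)
```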


What is especially attractive about the multi-dimensional contingency table representation of
the social network data problem as outlined here is that it generalizes to extensions of the
Holland-Leinhardt model (Fienberg and Wasserman, 1981a, 1981b) and it carries over to
networks involving multiple relationships. For further details, see Fienberg, Meyer, and
Wasserman (1981, 1983).


The final topic of this section also begins with one categorical data representation and ends
up with a different but familiar loglinear representation for a multiway table. The results of
ability tests are often structured in the form of sequences of 1's for correct answers and 0's
for incorrect answers. For a test with k problems or items administered to n individuals, we
let

$$Y_{ij} = \begin{cases} 1 & \text{if individual } i \text{ answers item } j \text{ correctly} \\ 0 & \text{otherwise.} \end{cases} \qquad (4.8)$$

Thus we have a two-way table of random variables $\{Y_{ij}\}$ with realizations $\{y_{ij}\}$. An
alternative representation of the data is in the form of an $n \times 2^k$ table $\{W_{i j_1 j_2 \ldots j_k}\}$, where the
subscript i still indexes individuals and now $j_1, j_2, \ldots, j_k$ refer to the correctness of the responses
on items 1, 2, ..., k, respectively, i.e.

$$W_{i j_1 j_2 \ldots j_k} = \begin{cases} 1 & \text{if } i \text{ responds } (j_1, j_2, \ldots, j_k) \\ 0 & \text{otherwise.} \end{cases} \qquad (4.9)$$


The Rasch model (Rasch, 1960, as reprinted in 1980) for the $\{Y_{ij}\}$ is

$$\log \frac{P(Y_{ij} = 1)}{P(Y_{ij} = 0)} = \psi + \mu_i + \nu_j , \qquad (4.10)$$

where

$$\sum_i \mu_i = \sum_j \nu_j = 0 . \qquad (4.11)$$

Expression (4.10) is a logit model in the usual contingency table sense for a 3-dimensional
array whose first layer is $\{y_{ij}\}$ and whose marginal totals, adding across layers, form an n×k table
of 1's.


Maximum likelihood estimation for the parameters of the Rasch model (4.10) has been the
focus of several authors including Rasch and Andersen. Unconditional maximum likelihood
(UML) estimates can be derived but they have rather problematic asymptotic properties, e.g.
the estimates are inconsistent as $n \to \infty$ and k remains moderate, although they are consistent
when both n and $k \to \infty$ (Haberman, 1977). Fischer (1981) provided an interesting link to the
loglinear model literature by approaching UML estimation via the embedding of the matrix
$y = \{y_{ij}\}$ into a larger $(n+k) \times (n+k)$ matrix of the form:

$$\begin{pmatrix} 0 & e^T - y^T \\ y & 0 \end{pmatrix} \qquad (4.12)$$

where e is an n×k matrix of 1's. Then he notes that the Rasch model of (4.10) is
transformed into an incomplete version of the Bradley-Terry model discussed at the beginning
of this section.
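The embedding in (4.12) can be sketched directly. The code below is an illustration under stated assumptions rather than Fischer's own procedure: it places the k items first and the n individuals second, so that the lower-left block is y and the upper-right block is the transpose of e minus y. One way to read the result (an interpretation, not spelled out above) is as a paired-comparison table in which individual i "beats" item j when the answer is correct, and in which individuals are never compared with individuals nor items with items, hence "incomplete". The function name and data are hypothetical.

```python
def fischer_embedding(y):
    """Embed an n x k 0/1 response matrix into an (n+k) x (n+k) block matrix."""
    n, k = len(y), len(y[0])
    size = n + k
    m = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(k):
            m[k + i][j] = y[i][j]      # lower-left block: y
            m[j][k + i] = 1 - y[i][j]  # upper-right block: e^T - y^T
    return m

# Hypothetical responses of n = 3 individuals to k = 2 items.
y = [
    [1, 0],
    [1, 1],
    [0, 0],
]
m = fischer_embedding(y)
for row in m:
    print(row)
```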
Now, we turn to a conditional approach to likelihood estimation (CML) advocated initially by
Rasch, who noted that the conditional distribution of Y given the individual marginal totals
$\{Y_{i+} = y_{i+}\}$ depends only on the item parameters, $\{\nu_j\}$. Then each of the row sums $\{y_{i+}\}$ can
take only k+1 distinct values, corresponding to the number of correct responses. Next, we recall
the alternate representation of the data in the form of an $n \times 2^k$ array, $\{W_{i j_1 j_2 \ldots j_k}\}$, as given by
expression (4.9). Adding across individuals we create a $2^k$ contingency table, X, with entries

$$X_{j_1 j_2 \ldots j_k} = W_{+ j_1 j_2 \ldots j_k} . \qquad (4.13)$$
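Collapsing over individuals as in (4.13) amounts to counting how often each response pattern occurs. The following minimal sketch shows one way to do this; the function name and the example responses are hypothetical, and the table is stored as a dictionary keyed by response pattern rather than as a k-dimensional array.

```python
from collections import Counter
from itertools import product

def pattern_table(y):
    """Count response patterns: maps each of the 2^k patterns to its frequency."""
    k = len(y[0])
    counts = Counter(tuple(row) for row in y)
    # Include patterns that nobody produced, so the table has all 2^k cells.
    return {pattern: counts.get(pattern, 0) for pattern in product((0, 1), repeat=k)}

# Hypothetical responses of n = 4 individuals to k = 3 items.
y = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
]
x = pattern_table(y)
print(x)
```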


Duncan (1983) and Tjur (1982) independently noted that we can estimate the item parameters
for the Rasch model of (4.10) using the $2^k$ array x, and the loglinear model

$$\log m_{j_1 j_2 \ldots j_k} = \omega + \sum_{s=1}^{k} \delta_{j_s} \nu_s + \gamma_{j_+} , \qquad (4.14)$$

where the subscript $j_+ = \sum_{s=1}^{k} j_s$, $\delta_{j_s} = 1$ if $j_s = 1$ and is 0 otherwise, and

$$\sum_{r=0}^{k} \gamma_r = 0 . \qquad (4.15)$$

r-o'r 

The amazing result, due to Tjur (1982), is that maximum likelihood estimation of the $2^k$
contingency table of expected values, $m = \{m_{j_1 j_2 \ldots j_k}\}$, using a Poisson sampling scheme and the
loglinear model (4.14), produces the conditional maximum likelihood estimates of $\{\nu_j\}$ for the
original Rasch model. Tjur proves this equivalence by (1) assuming that the individual
parameters are independent identically distributed random variables from some completely
unknown distribution, $\pi$; (2) integrating the conditional distribution of Y given $\{Y_{i+} = y_{i+}\}$ over
the mixing distribution, $\pi$; (3) embedding this "random effects" model in an "extended random
model"; and (4) noting that the likelihood for the extended model is equivalent to that for
(4.14) applied to x (using Result 3 of Section 3 above). Fienberg (1981) then noted that the
model of (4.14) is the model of quasi-symmetry preserving one-dimensional marginal totals,
first proposed by Bishop, Fienberg, and Holland (1975, Chapter 8).
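Rasch's observation that conditioning on an individual's total score removes the individual parameter can be checked numerically. The sketch below is only an illustration: it assumes that, given the individual, the k item responses are independent with logits as in (4.10), and it compares the conditional probability of one response pattern given its score for two very different ability values. The function names and parameter values are hypothetical.

```python
import math
from itertools import product

def pattern_prob(pattern, psi, mu, nu):
    """Probability of a 0/1 response pattern for one individual with ability mu."""
    p = 1.0
    for j_s, nu_s in zip(pattern, nu):
        p_correct = 1.0 / (1.0 + math.exp(-(psi + mu + nu_s)))
        p *= p_correct if j_s == 1 else 1.0 - p_correct
    return p

def conditional_on_score(pattern, psi, mu, nu):
    """Probability of `pattern` given the total number of correct responses."""
    score = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(nu)) if sum(q) == score]
    denom = sum(pattern_prob(q, psi, mu, nu) for q in same_score)
    return pattern_prob(pattern, psi, mu, nu) / denom

# Hypothetical item parameters (summing to zero) and two ability values.
nu = [0.7, -0.2, -0.5]
low = conditional_on_score((1, 0, 1), psi=0.3, mu=-1.5, nu=nu)
high = conditional_on_score((1, 0, 1), psi=0.3, mu=2.0, nu=nu)
print(low, high)
```

Under these assumptions the two conditional probabilities agree, and both reduce to a ratio of exponentials of sums of item parameters over the patterns that share the same score.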


Cressie and Holland (1983) have independently developed an approach to the Rasch model
similar to Tjur's and Duncan's, and they note other interesting linkages to other aspects of
latent trait models.

5. Speculation on Future Developments

The early parts of this paper represent a look backwards upon the development of the
methodology for the loglinear model analysis of categorical data. Section 4 is a brief
examination of the recent past and the present, stressing how researchers have been effectively
adapting the loglinear model approach to new non-standard categorical data problems. In
keeping with the spirit of the Conference, I have been allowed, indeed encouraged, to end with
some speculation on where we go from here.

The  easy  part  of  speculating  on  the  future  is  to  prepare  a  list  of  work  now  underway  or 
work  that  may  begin  quite  soon: 

(a) Ordinal Variables. Many papers have been written in recent years extending the
comprehensive loglinear approach to problems involving ordinal variables (e.g. Agresti,
1982, or Fienberg, 1982b). A subtle problem described in passing in these papers is
the use of monotonicity constraints on loglinear parameters to reflect the ordinal
structure. There is also a dual problem involving monotonicity constraints on
marginal totals. The computation of MLE's for such models requires attention as
does the issue of assessing goodness-of-fit.

(b) Two Problems in the Analysis of Network Data. A problem we skimmed by in
Section 4 is the lack of relevance of standard asymptotic theory for the loglinear
model for network data. The 4-way array used above is of size $4g^2$, with a total
count of 2g(g-1), while the Holland-Leinhardt model has 2g parameters. Haberman
(1981) gives some relevant asymptotics for this problem, but more attention is
required before the distribution of goodness-of-fit statistics is in hand. A second
vexing problem for network data is the assumption of dyadic independence. What is
needed is the formulation of a less restrictive model allowing for dyadic dependence,
which includes the Holland-Leinhardt model as a special case.

(c) Logistic Regression Diagnostics. The logistic regression model is a simple
extension of the logit model of Section 3 where the explanatory variables are
continuous rather than categorical (see Fienberg, 1980, Chapter 6). One way to view
such models is as corresponding to very sparse cross-classifications, and thus it comes
as no surprise that the usual asymptotic theory for overall goodness-of-fit statistics is
inapplicable. Landwehr, Pregibon, and Shoemaker (1983) give some interesting
graphical devices for logistic regression diagnostics to help with assessing
goodness-of-fit. More attention to this problem is needed.

(d)  Computation.  Although  computer  programs  for  fitting  loglinear  and  logit  models 
are  now  widely  available,  many  of  the  more  interesting  applications  involve  very 
large,  sparse  arrays  that  do  not  fit  easily  into  core  in  most  modern  computers.  Two 
directions  of  research  on  computations  for  categorical  data  will  include  (1)  the 
development  of  programs  for  personal  computers  that  make  effective  use  of  disk  and 
auxiliary  storage  space,  and  (2)  the  development  of  programs  for  array  processors  that 
make  effective  use  of  parallel  algorithms. 

(e) Bayesian Approaches. Many papers have been written on Bayesian approaches to
the analysis of categorical data using loglinear and logit models. No one has yet
described an easily implementable Bayesian approach for large multi-way tables.

(f)  Applications.  To  date  the  applications  of  the  loglinear  model  methodology  have 
occurred  primarily  in  the  biological,  medical,  and  social  sciences.  I  believe  we  can 
look  forward  to  new  applications  in  agriculture  and  in  industrial  settings.  In 
particular,  I  foresee  the  use  of  loglinear  model  methodology  in  the  development  of 
multivariate  quality  control  techniques. 

It is more difficult to look further ahead into the future. Now that we have reached the
stage where we have developed a comprehensive approach to categorical data analysis that
parallels the linear model theory for measurement data, I believe we need to step back. As
useful as the loglinear model approach has proved to be, it is all too easy to misinterpret
loglinear model parameters by imparting inappropriate causal interpretations. I see little hope
for a "grand unified theory" applicable to all problems in all settings. The key to
understanding longitudinal processes, for example, is the development of formal stochastic
models and their application to observed data. When the data are categorical and are
measured at several fixed points in time the issue should be: how well does the underlying
stochastic model fit? Instead we tend to fit loglinear or other off-the-shelf models to the
resulting cross-classifications, and then to make loose interpretations about the "ideas" in the
stochastic model. More careful attention to such problems (e.g. see Cohen and Singer, 1979,
Singer and Cohen, 1980, and Singer, 1981) may bear far more interesting results, and will
certainly generate difficult statistical problems in need of solution. Thus, for me, the most
promising direction for statistical research on categorical data analysis is away from the
comprehensive approach described in this paper.